Global Speaker Clustering towards Optimal Stopping Criterion in Binary Key Speaker Diarization
Abstract
The recently proposed speaker diarization technique based on binary keys provides a very fast alternative to state-of-the-art systems with only a small increase in Diarization Error Rate (DER). Although the approach shows great potential, it also has weaknesses, chiefly in its stopping criterion, so alternative clustering and stopping-criterion approaches need to be explored. Some recent works have framed speaker clustering as a global optimization problem in order to tackle the intrinsic issues of Agglomerative Hierarchical Clustering (AHC), mainly its decision making based on local maxima. This paper adapts and applies this framework to the binary key diarization system. In addition, cluster purity is analyzed across the AHC iterations using reference (ground-truth) speaker labels in order to select the purest clustering as input to the global framework. Experiments on the REPERE phase 1 test database show improvements of around 6% absolute DER over the baseline system output.
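The purity analysis mentioned in the abstract can be illustrated with a minimal sketch. It uses the common textbook definition of cluster purity (the share of segments that carry the dominant reference speaker of their cluster), which may differ from the exact measure used in the paper. The function names `cluster_purity` and `purest_iteration`, the per-segment label lists, and the tie-breaking rule (prefer the latest AHC iteration, assumed to have the fewest clusters) are all assumptions made for illustration.

```python
from collections import Counter


def cluster_purity(cluster_ids, speaker_ids):
    """Fraction of segments whose reference speaker is the dominant
    speaker of the cluster they were assigned to (textbook purity)."""
    if len(cluster_ids) != len(speaker_ids):
        raise ValueError("label sequences must have the same length")
    if not cluster_ids:
        raise ValueError("at least one segment is required")

    # Count reference speakers inside each hypothesized cluster.
    per_cluster = {}
    for cluster, speaker in zip(cluster_ids, speaker_ids):
        per_cluster.setdefault(cluster, Counter())[speaker] += 1

    # Sum the size of the dominant speaker in every cluster.
    dominant = sum(max(counts.values()) for counts in per_cluster.values())
    return dominant / len(cluster_ids)


def purest_iteration(clusterings, speaker_ids):
    """Return (index, purities) for a list of clusterings, one per AHC
    iteration. Assumption: ties go to the latest iteration, which in a
    bottom-up AHC run has the fewest clusters."""
    purities = [cluster_purity(c, speaker_ids) for c in clusterings]
    best = max(range(len(purities)), key=lambda i: (purities[i], i))
    return best, purities


if __name__ == "__main__":
    # Toy reference labels for six segments (hypothetical data).
    reference = ["A", "A", "B", "B", "A", "C"]
    # Hypothetical cluster assignments after three successive AHC stages.
    iterations = [
        [0, 1, 2, 3, 4, 5],
        [0, 0, 1, 1, 2, 3],
        [0, 0, 1, 1, 1, 2],
    ]
    index, scores = purest_iteration(iterations, reference)
    print(index, scores)
```

Purity alone is maximized trivially by many small clusters, which is why this sketch breaks ties toward the later iteration. Any real selection rule would also need to account for the number of clusters.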
Similar resources
Speaker Attribution of Australian Broadcast News Data
Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achie...
Towards a Better Integration of Written Names for Unsupervised Speakers Identification in Videos
Existing methods for unsupervised identification of speakers in TV broadcast usually rely on the output of a speaker diarization module and try to name each cluster using names provided by another source of information: we call it “late naming”. Hence, written names extracted from title blocks tend to lead to high precision identification, although they cannot correct errors made during the clu...
Novel clustering selection criterion for fast binary key speaker diarization
Speaker diarization has become an important building block in many speech-related systems. Given the great increase of audiovisual media, fast systems are required in order to process large amounts of data in a reasonable time. In this regard, the recently proposed speaker diarization system based on binary key speaker modeling provides a very fast alternative to state-of-the-art systems at the...
A robust stopping criterion for agglomerative hierarchical clustering in a speaker diarization system
Agglomerative hierarchical clustering (AHC) is an unsupervised classification strategy of merging the closest pair of clusters recursively, and has been widely used in speaker diarization systems to classify speech segments by speaker identity. The most critical part in AHC is how to automatically stop the recursive process at the point when clustering error rate reaches its lowest possible val...
Towards a complete binary key system for the speaker diarization task
Speaker diarization is the task of partitioning an audio stream into homogeneous segments according to speaker identity. Today state-of-the-art speaker diarization systems have achieved very competitive performance. However, any small improvement in Diarization Error Rate (DER) is usually subject to very large processing times (real time factor above one), which makes systems not suitable for s...